
    Editorial Tag Endogeneity for News Websites

    Editors and journalists at some news websites label their articles with editorial tags related to structure and content. Each article can have more than one tag, and each tag can be used in more than one article. A network of tags can be defined whose edges are all possible pairs of tags in each article. Because editorial tags relate to structure and content rather than to individual articles, the analysis of a network of editorial tags could assist editorial decisions to prioritize types of content and articles. In this paper we analyze the network of editorial tags of one of the fastest growing news websites in Portugal, with over 6.1 million visits, 7.6 million page views, and over 1200 editorial tags in 15 months. Standard network characterization reveals an average node degree of 15.5, an average clustering coefficient of 0.794, and an average path length of 2.36, which are indicators of small-world and triadic closure effects. We use this tag network to propose endogenous and exogenous models that predict transitions between tags of consecutive article views. The editor can use this tag transition model to prioritize types of articles: articles with endogenous tags to try to promote the reading of articles with similar content, and articles with exogenous tags to try to promote the reading of articles with different content.
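The tag network construction described in this abstract (tags as nodes, one edge for every pair of tags that share an article) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the toy `articles` list are assumptions made for the example, and only average degree and average clustering coefficient are computed.

```python
from collections import defaultdict
from itertools import combinations


def build_tag_network(articles):
    """Return adjacency sets: tags are nodes, co-occurring tags share an edge."""
    adj = defaultdict(set)
    for tags in articles:
        unique = sorted(set(tags))
        for tag in unique:
            adj[tag]  # touching the key registers tags that have no neighbors
        for a, b in combinations(unique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def average_degree(adj):
    """Mean number of neighbors per tag."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def local_clustering(adj, node):
    """Fraction of a tag's neighbor pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean local clustering coefficient over all tags."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Hypothetical toy data: each inner list holds the tags of one article.
articles = [
    ["politics", "economy", "europe"],
    ["economy", "markets"],
    ["politics", "europe"],
    ["sports"],
]
network = build_tag_network(articles)
print(average_degree(network), average_clustering(network))
```

Plain dictionaries of sets keep the sketch dependency-free; a graph library would be the more natural choice for a network with over 1200 tags.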

    A Scheduler for Cloud Bursting of Map-Intensive Traffic Analysis Jobs

    Proceedings of: Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015). Krakow (Poland), September 10-11, 2015. Network traffic analysis is important for detecting intrusions and managing application traffic. Low-cost, cluster-based traffic analysis solutions have been proposed for bulk processing of large blocks of traffic captures, scaling out the processing capability of a single network analysis node. Because traffic intensity varies with the natural burstiness of network traffic, a network analysis cluster may have to be severely over-dimensioned to support 24/7 continuous packet block capture and processing. Bursting the analysis of some of the packet blocks to the cloud may attenuate the need for over-dimensioning the local cluster. In fact, existing solutions for network traffic analysis in the cloud are already providing the traditional benefits of cloud-based services to network traffic analysts and opening the door to cloud-based Elastic MapReduce-style traffic analysis solutions. In this paper we propose a scheduler of packet block network analysis jobs that chooses between sending a job to a local cluster and sending it to a network analysis service on the cloud. We focus on map-intensive jobs such as string matching-based virus and malware detection. We present an architecture for a Hadoop-based network analysis solution including our scheduler, report on using this approach in a small cluster, and show scheduling performance results obtained through simulation. With our scheduler we achieve a reduction of more than 50% in the amount of network traffic we need to burst out, compared to a simple traffic threshold scheduler and a full resource availability scheduler. Finally, we discuss scaling-out issues for our network analysis solution.
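The abstract does not spell out the scheduling policy, so the following is only a hypothetical sketch of the local-versus-cloud decision it describes. It assumes a single shared deadline, a fixed local processing rate, and jobs queued in arrival order; the names `Job`, `schedule`, `local_rate_mb_s`, and `deadline_s` are invented for the example and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    block_id: int
    size_mb: float  # size of the captured packet block


def schedule(jobs, local_rate_mb_s, deadline_s):
    """Greedy placement: keep a job local if the cluster, after finishing the
    jobs already placed locally, can complete it before the deadline;
    otherwise burst it to the cloud."""
    busy_until = 0.0  # time at which the local cluster becomes free
    placement = {}
    for job in jobs:
        finish = busy_until + job.size_mb / local_rate_mb_s
        if finish <= deadline_s:
            placement[job.block_id] = "local"
            busy_until = finish
        else:
            placement[job.block_id] = "cloud"
    return placement


# Hypothetical workload: three packet blocks arriving together.
jobs = [Job(1, 400.0), Job(2, 800.0), Job(3, 300.0)]
placement = schedule(jobs, local_rate_mb_s=100.0, deadline_s=10.0)
print(placement)
```

In this toy run the second block would finish after the deadline if kept local, so it is the only one sent to the cloud. A policy of this kind bursts a job only when local capacity is insufficient, which is the general idea behind reducing the traffic sent out.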